Authorship Attribution and Author Profiling of Lithuanian Literary Texts

Authors

  • Jurgita Kapociute-Dzikiene
  • Andrius Utka
  • Ligita Sarkute
Abstract

In this work we address authorship attribution and author profiling (focusing on the age and gender dimensions) for the Lithuanian language. This paper reports the first results on literary texts, which we compare with results previously obtained on other functional styles and language types (i.e., parliamentary transcripts and forum posts). Using the Naïve Bayes Multinomial and Support Vector Machine methods, we investigated the impact of various stylistic, character, lexical, and morpho-syntactic features and their combinations; author set sizes of 3, 5, 10, 20, 50, and 100 candidate authors; and dataset sizes of 100, 300, 500, 1,000, 2,000, and 5,000 instances in each class. The highest accuracy in the authorship attribution task with the maximum number of candidate authors, 89.2%, was achieved with the Naïve Bayes Multinomial method and document-level character tri-grams. The highest accuracy in the author profiling task focusing on the age dimension, 78.3%, was achieved with the Support Vector Machine method and token lemmas. In the author profiling task focusing on the gender dimension, accuracy reached 100% with the Naïve Bayes Multinomial method and rather small datasets, where various lexical, morpho-syntactic, and character feature types performed very similarly.
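To make the best-performing attribution setup in the abstract more concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier over document-level character tri-gram counts. It is not the authors' implementation: the class name `TrigramNaiveBayes`, the helper `char_trigrams`, the add-one (Laplace) smoothing, and the tiny English toy texts are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(text):
    """Count overlapping character tri-grams in one document."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


class TrigramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character tri-gram counts."""

    def fit(self, texts, authors):
        self.counts = defaultdict(Counter)  # author -> tri-gram counts
        self.docs = Counter(authors)        # author -> number of training texts
        for text, author in zip(texts, authors):
            self.counts[author].update(char_trigrams(text))
        self.vocab = set()
        for author_counts in self.counts.values():
            self.vocab.update(author_counts)
        return self

    def log_scores(self, text):
        grams = char_trigrams(text)
        total_docs = sum(self.docs.values())
        scores = {}
        for author, author_counts in self.counts.items():
            # Add-one smoothing: an assumption, not taken from the paper.
            denom = sum(author_counts.values()) + len(self.vocab)
            score = math.log(self.docs[author] / total_docs)
            for gram, n in grams.items():
                if gram in self.vocab:  # skip tri-grams unseen in training
                    score += n * math.log((author_counts[gram] + 1) / denom)
            scores[author] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data, only to show the calling pattern.
train_texts = [
    "the sea was calm and the sea was grey",
    "the sea rolled under the grey ships",
    "zebras quickly jump over lazy zebras",
    "quick zebras jump and jump again",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = TrigramNaiveBayes().fit(train_texts, train_authors)
print(model.predict("the grey sea"))
print(model.predict("lazy zebras jump"))
```

Character tri-grams need no tokenizer or lemmatizer, which is one reason they are a convenient baseline for morphologically rich languages; the lemma-based features mentioned in the abstract would additionally require a morphological analyzer, which this sketch does not attempt.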

Related resources

The Effect of Author Set Size in Authorship Attribution for Lithuanian

This paper reports the first authorship attribution results on the effect of author set size using automatic computational methods for the Lithuanian language. The aim is to determine how quickly authorship attribution results deteriorate as the number of candidate authors gradually increases, i.e., starting from 3 and going up to 5, 10, 20, 50, and 100. Using supervised machine...

Text Classification For Authorship Attribution Analysis

Authorship attribution mainly deals with the undecided authorship of literary texts. It is useful for resolving uncertain authorship, recognizing the authorship of unknown texts, spotting plagiarism, and so on. Statistical methods can be used to characterize an author's style numerically. The basic measures used in computational stylometry are word length, sen...

Explaining Delta, or: How do distance measures for authorship attribution work?

Authorship Attribution is a research area in quantitative text analysis concerned with attributing texts of unknown or disputed authorship to their actual author based on quantitatively measured linguistic evidence (see Juola 2006; Stamatatos 2009; Koppel et al. 2009). Authorship attribution has applications in literary studies, history, forensics and many other fields, e.g. corpus stylistics (...

Who Is the Author of Lavāyeh?

Lavāyeh is one of the most famous Persian mystical texts. The treatise was written following the example of Sheikh Ahmad Ghazzāli’s Savāneh. The devotion of ‘Eyn al-Qozāt of Hamadān to the author of Savāneh, the relationship between the two, and some other factors caused Lavāyeh to be attributed to ‘Eyn al-Qozāt, and it was even once published under his name in Iran. This untru...

A Deep Context Grammatical Model For Authorship Attribution

We define a variable-order Markov model, representing a Probabilistic Context Free Grammar, built from the sentence-level, delexicalized parse of source texts generated by a standard lexicalized parser, which we apply to the authorship attribution task. First, we motivate this model in the context of previous research on syntactic features in the area, outlining some of the general strengths an...

Publication date: 2015